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Abstract 

In this work we consider a class of uncertainty quantification problems where the 
system performance or reliability is characterized by a scalar parameter y. The 
performance parameter y is random due to the presence of various sources of un¬ 
certainty in the system, and our goal is to estimate the probability density function 
(PDF) of y. We propose to use the multicanonical Monte Carlo (MMC) method, 
a special type of adaptive importance sampling algorithm, to compute the PDF 
of interest. Moreover, we develop an adaptive algorithm to construct local Gaus¬ 
sian process surrogates to further accelerate the MMC iterations. With numerical 
examples we demonstrate that the proposed method can achieve several orders of 
magnitudes of speedup over the standard Monte Carlo method. 
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1 Introduction 


Uncertainty is an inevitable feature of real-world engineering systems. In those 
systems uncertainty can rise from various of sources: material properties, geo¬ 
metric parameters, boundary conditions, applied loadings and so on. In prac¬ 
tice, it is essentially important to characterize and quantify the impact of the 
uncertainties on the system performances, which constitutes a central task 
of the newly emerging held of Uncertainty Quantihcation (UQ). To be spe- 
cihc, we consider the UQ problems in the following setting. We assume that 
the system is (formally) characterized by a performance function y = (y'(x), 
where the input x is a random vector collecting all the uncertain factors in 
the system and y is a. scalar indicating the system performance or reliabil¬ 
ity (in what follows, we will simply refer to y as the performance variable). 
A typical example is the structural design problems, in which y can be the 
stress or the deformation. In this setting, the key task is to accurately assess 
and quantify the uncertainty in the performance parameter y. A challenge 
here is that real-world applications demand various statistical information of 
the performance y. for example, in robust design, the interests are mainly in 
the lower moments, especially the mean and the variance ra. in reliability 
analysis, it is mainly the tail probability IZH, in risk management, one can be 
interested in the tail probability as well as some extreme quantiles [22] , and in 
utility optimization, the complete distribution of the performance parameter 
is required [H]. To this end, a unihed solution is to acquire the knowledge of 
the probability distribution of the performance parameter, which provides a 
complete characterization of the uncertainty in it. In theory, the distribution of 
y can be estimated by crucial Monte Carlo (MC) simulations, provided that a 
sufficient number of samples can be afforded. In reality, however, the function 
g \ yi ^ y generally admits no analytical form, and evaluating function 5 f(x) 
must be done by performing computer simulation of the underlying system, 
which renders estimating the distribution of y with crucial MC impractical. 

The main purpose of this work is to provide an efficient method to compute the 
full distribution of y. The proposed method has two major ingredients. First, 
we propose to sample the distribution of y with the multicanonical Monte 
Carlo (MMC) method, which can be regarded as a more efficient alternative 
to MC. The MMC method was initially developed by Berg and Neuhaus [6117] 
to explore the energy landscape of a given physical system, and later it has 
been adopted to simulate rare events, such as transmission errors in opti¬ 
cal communication systems II5EB] , and the rare growth factors in random 
matrices na. Roughly speaking, the MMC method constructs an iterative 
procedure that generates samples forming a flat histogram in the space of the 
parameter of interest (i.e., the energy in the original problem setup). As will 
be shown in Section the MMC method often requires to iterate many times 
and in each iteration it employs Markov chain Monte Carlo (MCMC) simu- 
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lations to draw a rather large number of samples. As a result, the direct use 
of MMC to sampling the distribution of the performance can still be compu¬ 
tationally demanding, especially for systems with computationally intensive 
models. To this end, the second major component of our method is to employ 
computationally inexpensive surrogates to further reduce the computational 
cost of MMC. In particular, building on the method proposed in the work HU, 
we adaptively construct local Gaussian process (GP) surrogates in the MGMG 
iteration. We choose to use the this method for the following reasons: hrst, the 
surrogate construction scheme is naturally incorporated in the MGMG itera¬ 
tions, which makes it convenient to use; secondly, unlike many other surrogate 
based algorithms which introduce errors in the equilibrium distribution, this 
method samples asymptotically from the exact distribution of interest cu¬ 


lt should be noted that the purpose of the MMG method differs from that 
of the advanced sampling techniques developed in the held of reliability anal¬ 
ysis or rare event simulations, such as the cross entropy method [IH], subset 
simulations |2], sequential Monte Garlo p], etc. Namely, the purpose of those 
methods is to provide a variance-reduced estimator for for a specihc parameter 
associated with the distribution of y, while that of our method is to estimate 
the distribution of y itself. As will be shown in the next section, MMG is par¬ 
ticularly useful for this purpose, which is our primary motivation to choose 
MMG over other advanced sampling schemes. 


The rest of this paper is organized as the following. We hrst review the MMG 
method in Section and then present our local GP construction algorithm in 
Section 3. Finally numerical examples are provided in Section]^ to demonstrate 
the ehectiveness of the proposed method. 


2 The multicanonical Monte Carlo method 


In this section we introduce the MMG algorithm, largely following the pre¬ 
sentation of [S] . We start by summarizing the basic setup of our problem. Let 
X be a random vector taking values in the state space X, and y = g(x.) be 
a real scalar function of x. For simplicity we assume that both x and y are 
continuous random variables whose probability density functions exist. We 
further assume that the PDF p(x) of x is known, possibly up to an unknown 
normalization constant, and our goal is to determine the PDF TT{y) of y. 
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2.1 Flat histogram importance sampling 


A popular strategy to estimate the PDF of a continuous random variable y 
with simulation, is to approximate the PDF with histograms, like a special 
case of the kernel density estimation. Suppose we are interested in the PDF 
of 1 / in a given closed interval Ry, and we hrst equally decompose Ry into M 
bins of width A centered at the discrete values {6i,..., &m}- We dehne the i-th 
bin as the interval Bi = \bi — and the probability for y to be in Bi 

is Pi = P{?/ G Bi}. The PDF of y at point yi then can be approximated by 


if A is sufficiently small. This binning implicitly dehnes a partition of the 
input space X into M domains {Di}fL.^, where 


A = {x e X : 5 ((x) e Bi] 


is the domain in X that maps into the i-th bin Bi. See Fig. |2.1 for an illustra¬ 
tion. Note that, while A are simple intervals, the domains Di are multidimen¬ 
sional regions with possibly tortuous topologies. As a result, the probability 
Pi can be re-written as an integral in the input space: 

= Id = J /Di(x)p(x)cia: = E[A(x)], (2.1) 

where /d,(x) is an indicator function dehned as. 


^D,(x) 


1 X G A; 

0 otherwise. 


Now suppose that N samples {xi,... ,xjv} are drawn from the distribution 
p(x), possibly with MCMC, Pi can be evaluated with the MC estimator: 

= ( 2 . 2 ) 

where A is the number of samples that fall in bin A- 


As is well known, standard MC simulations have difficulty in reliably esti¬ 
mating the probabilities in the tail bins. The technique of importance sam¬ 
pling (IS) can be used to address the issue. Namely we choose a biasing dis¬ 


tribution g(x) and re-write (2.1) as 


P. = j /D.(x)l^|«(x)<ix = E-|/d.(A')u.(A')] 


(2.3) 
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Fig. 1. Schematic illustration of the connection between Bi and D*. 

where w{x.) = p(x)/g(x) is the IS weight, and E* indicates expectation with 
respect to the biasing distribution ^(x). It follows that the IS estimator of Pi 
becomes 



where the samples {xi,... ,XAr} are now drawn from the biasing distribution 
g(x), and N* is the number of samples falling in region Di. For conciseness, we 
let H* = The intuition behind IS is that, the biasing distribution should 
assign higher probability in the region of interest than the original one, and 
thus it can draw more samples in that region. 

The key of IS is to choose an appropriate biasing distribution g(x) that can 
help to achieve the objective of the simulation. Unlike regular IS methods 
which usually employ biasing distributions that are easy to sample from, the 
MMC method chooses a biasing distribution q{x.) in the form of: 

f hh X e B- 

,(x) = e(x) ' (2.5) 

[ 0 X ^ T). 


where 0(x) = 0^. For g(x) to be a well-dehned distribution, we must have 
-Pi/0i=l- It is easy to see that the distribution given in Eq. (2.5) assigns 
a constant weight to all x G /Dp w{'x.) = Wi for ^ Di where Wi = 0j, which 
is referred to be as uniform-weight (UW). In particular, if we let 0j = MPi 
for all X G /Dpi = 1, ...,M. the biasing distribution in Eq. (2.5) assigns equal 
probability to each bin and zero probability for any region outside D = U^^/Dj, 
namely, 

P* = P* = ...Pm = 1/M, where P* = J lD.{x)q{x)dx. (2.6) 
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We say such a biasing distribution as to be flat-histogram (FH). FH is an 
important feature for our purpose which is to have a good estimate of Pi for 

alH = 1... M. 


2.2 Multicanonical Monte Carlo 


It is easy to see, however, that the actual UW-FH distribution presented in 
Section [2?T] can not be used directly, as ©j depend on the sought after unknown 
Pi. The MMC method addresses the issue in an incremental manner. Simply 
speaking MMC iteratively constructs a sequence of distributions 




P(x) 

0fc(x)’ 

0 


X e F); 

X ^ D. 


(2.7) 


where 0 a:(x) = 0^,* for x G Di, converging to the actual UW-FH distribution. 
Specifically the sequence usually starts with go(x) where 0o,i = p for all i = 
1,..., M and p = Pi < 1 is the probability that y falls in the region of 
interestp~| The iteration is then guided by the following equation: 


p* 



Id, p(x)dx 
CsOi 


Jl. 

ce0i’ 


( 2 . 8 ) 


or equivalently Pj = PiQi. Namely, in the fc-th iteration, one first draws 
N samples from the current distribution qki^), and then updates 

{Qk+i,i}ili using the following formulas, which are derived from Eq. (2.8), 


Nt ■ 

TJ _ 

J J. U "j — • 

(2.9a) 

2 — k i ^ k j ^ 

(2.9b) 


(2.9c) 


where ^ is the number of samples falling into region Di in the k-th iteration. 
We reinstate that, unlike a usual IS method, which often chooses a biasing 
distribution easy to sample from, the biasing distribution of the MMC method 
Eq. (2.7) is not a standard distribution, and thus directly sampling from the 
distribution is rather difficult. To this end, MMC usually employs MCMC 
algorithm to draw samples from qki'x). Formal convergence analysis, as well 
as possible improvements of the method are not discussed in this work, and 


interested readers may consult, e.g.. 


and the references therein. 


^ In practice, it is often convenient to assume that p ~ 1 and in this case we have 
qo{x) pep{x). 
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3 Accelerating MMC with local GP surrogates 


In the MMC iteration, the main computational cost arises from performing the 
MCMC iteration to draw samples from each Qki'x.), for each sample requires a 
full-scale simulation of the underlying system. Thus, the MMC efficiency can 
be signihcantly improved by using computationally inexpensive surrogates in 
the MCMC scheme. As is mentioned in Section 1, here we adopt the adaptive 
surrogate construction scheme developed in HU. In the work, the authors 
presented their method with two different surrogate models: the quadratic 
regression and the Gaussian processes, and their numerical results suggest that 
the GP model has better performance. We thus choose to use the GP model, 
while noting that other types of surrogates can also be used. In this section, 
we hrst briefly introduce the GP surrogate and then present the adaptive 
surrogate construction scheme modihed for our specihc use in MMG. 


3.1 Gaussian process regression 


The GP surrogates, which are also known as kriging, have been widely used in 
many practical problems (see e.g., 121 ). The GP surrogate constructs the ap¬ 
proximation of 5 '(x) in a nonparametric Bayesian regression framework [T^IE^ . 
Specihcally the target function g{-x.) is cast as 

^(x) =/io(x)+? 7 (x) (3.1) 

where )no(x) is a real-valued function and e(x) is a zero mean Gaussian process 
whose covariance is specihed by a kernel A(x, x'), namely, 

GOV[r 7 (x),r 7 (x')] = A(x,x'). 

In practice, /io(x) can be represented as a linear or a quadratic polynomial 
whose coefficients can be determined by simple regression. In this work, we 
assume it is a quadratic polynomial. The kernel A(x, x') is positive semideh- 
nite and bounded. Popular choices of the covariance functions include squared 
exponential, exponential, and Matern. The hyper-parameters inside the covari¬ 
ance functions can be prescribed or determined by maximizing the marginal 
likelihood function. Suppose that N computer simulations of the function 
5 '(x) are performed at parameter values X* := [x^,.. .x*], yielding function 
evaluations y* := [j/^,... y*], where 

y*i=9{^i) for i = l,...,n. 
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Suppose we want to predict the function values at a given point x, i.e., y 
gi'x), the posterior of which is Gaussian: 


2 / I X, X*, y* ~ f7V:(/i(x), cr2(x)). (3.2) 

The posterior mean of y is 

/i(x) = /io(x) + X(x,X*)^X(X*,X*)-'(y* -po(X*)), (3.3a) 

and the posterior variance is 

a2(x) = X(x, x) - X(x, X*)'^ X(X*, X*)"^ X(X*, x), (3.3b) 


where the notation K{A, B) to denote the matrix of the covariance evaluated 
at all pairs of points in set A and in set B [21]. Eq. (3.3a) can be used as the 


surrogate to predict the function values at points of interest, and Eq. (3.3b) 
provides a measure of conhdence in the predicted values. 


3.2 Local GP construction 


In the standard GP methods, the surrogates are constructed with all the data 
points. Gonstructing the GP surrogate can be very costly when the data set 
becomes large, as it involves inverting a large covariance matrix. On the other 
hand, it has been well noted that data points far from the point of interest have 
little influence on the prediction (assuming the usual choices of covariance). 
Thus, a natural choice is to construct GP only with the data points near the 
point of interest. The resulting surrogate is thus local, in the sense that it is 
only intended to be accurate at the point of interest. Next we discuss in detail 
how to construct a local GP surrogate at point x given a collection of model 
evaluations: S := {(x*, j/i)}((f;^ where yi = g{^i) for i = I...U 5 . 

First we need to determine how many data points we want to use in the 
surrogate construction. Following the suggestion of m , we choose the number 
of data points n as 

^ = \fdx{dx + l)(dx + 2)/2, 

where is the dimensionality of x. This choice allows us to have sufficient 
data points to perform a quadratic regression for /ro(x). The specihc points 
used to build the surrogate are chosen with the nearest neighbor (NN) method: 
namely, we use the n points closest to x to construct the GP surrogate. It has 
been pointed out that the NN method only provides a suboptimal point selec¬ 
tion, and better selection strategy can be obtained by solving an optimization 
problem. However, in our problem, the GP construction must be done repeat¬ 
edly in the MGMG scheme, and as result even very fast optimization may 




significantly increase the total compntational cost. In this respect, we nev¬ 
ertheless adopt the NN method for the sake of compntational simplicity. In 
what follows, we refer to a local GP snrrogate constrncted with the prescribed 
procednre, as ^(x|S). 


3.3 MCMC with local GP surrogates 


In this section, we present a modihed version of the local snrrogate accelerated 
MCMC scheme developed in m The method embeds an adaptive snrrogate 
constrnction in the MCMC iteration: in each iteration the method constrncts 
a local snrrogate nsing data set S, for the proposed point and the cnrrent 
point, and decides whether it needs model rehnement; when rehnement is 
needed, the algorithm then rehnes the snrrogate by evalnating more points 
near the proposed point or the cnrrent one depending on where the rehnement 
is triggered; all the evaluated points are included in the data set S which will 
be used for constructing surrogates in the next step. In m, rehnement is 
triggered by either of two criteria. The hrst is random: with probability 7 t, 
the model rehned at either the current point or the proposed point. We follow 
the random criterion in our algorithm as it is the essential for the theoretical 
convergence of the algorithm. The second criterion used in m, intended to 
make the algorithm efficient in practice, is based on an error indicator of the 
acceptance probability. In this work, we follow the random criteria and choose 
7 t to be a constant for simplicity. We use, however, a diherent practical 
criterion, taking advantage of the special structure of the target distribution 
gfc(x) in Eq. (2.7). Namely, it is easy to see that, for qki'x.) in Eq. (2.7), an error 
in the surrogate does not cause an error in the acceptance probability unless 
the surrogate assigns the sample into a wrong bin, assuming a symmetric 
proposal distribution. Specihcally, suppose the current sample is x“ and the 
proposed sample is x+, and the posterior mean and variance of the GP at x*^ 
are y~^ and respectively. Suppose it is assigned to bin Bi = [6*— A/2, bi+A/2] 
based on the predicted value y^, and the probability that the assignment of 
Xj is incorrect can be computed as 


:= P[5f(x'*’) < bi — A/2ot g{x.^) > h + A/2] 

= - A/2, y+, e) - + A/2, y+, e) 1, (3.4) 

where <h(-, j/’*', e) is the cumulative density function (CDF) of a normal dis¬ 
tribution with mean y~^ and standard deviation e. Thus we can dehne the 
rehnement criteria as that the misassignment probability f3 is smaller than 
a threshold value: (3 < /dmax- Since the rehnement criteria is applied to each 
iteration, the probability that the acceptance probability computed with the 
surrogate is erroneous is bounded by 2/3jnax, in any iteration. As a result, 
to achieve this probability boundedness, we only need to check if x+ satishes 
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the quality condition: /3(x+) < /3max, as x“ has been verihed in the previ¬ 
ous iteration. We outline our algorithm in Algorithm 1, where the surrogate 
construction is integrated into a standard Metropolis-Hastings (MH) MCMC 
scheme. 

Algorithm 1 Metropolis-Hastings with local GP surrogates 

1 : for t = 1,..., T do 

2 - (Xt-i-l, J/i+l, S^^i) i Aj(x^, J/i, Si, Q'(', 2 /^), fljYiin) 

3: end for 
4: 

5: procedure A't(x“, ?/“, S, g(-; ?/"), 7 , amin) 

6 : Draw proposal x+ n(x-,-) 

7: {y+,e+) 

8 : if M ~ Uniform(0, 1) < 7 then 

9: = fi'(x+) 

10 : S 4 -SU{(x+, 2 /+)} 

11 : else 

12: /?^l + $(?/i,2/+,e)-$(2/2,2/+,e+) 

13: if /3 > /3max then 

14: y+ = g{x+) 

15: S ^ S U {(x+,|/+)} 

16: end if 

17: end if 

18: a ^ g(x+;|/-)/g(x-;|/-) 

19: if M ~ Uniform(0,1) < a then 

20 : return (x+, J/+, S) 

21 : else 

22 : return (x“, j/“, S) 

23: end if 

24: end procedure 


We have the following remarks regarding the proposed algorithm, highlighting 
its differences from that given in m in addition to the refinement criteria. 


As a pre-processing of the first MMC iteration, we choose Uo points, and use 
them as the initial data set S. These points can be chosen in many different 
ways: sampling according to p(x), Latin hypercube, or experimental design 
methods. For the succeeding MMC iterations, the data set S is simply taken 
to be that obtained in the previous round. 

Unlike regular MCMC methods, in each iteration our algorithm returns the 
sample x* as well as the function value yt for the sample. Note that, the 
function valnes are needed in Eqs. (2.9), and thus by recording the function 
values, we can compute Eq. (2.9) without evaluating the fnnction again. 

As has been discussed in the beginning of Section 3.3, in each iteration we 
only need to consider the quality of the surrogate at the proposed point 
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x"*" thanks to the special structure of while in the original algorithm, 

both x"*" and x“ need to be examined. 

• In our algorithm, when model rehnement is needed, we simply evaluate 
the current point x'*". It has been suggested that this strategy may lead to 
poor conditioned regression in particular when polynomial surrogates are 
used, and as an alternative a space hlling approach is used in mi- However, 
we have found it is not a very serious issue for the GP surrogates in our 
numerical tests, and, considering that the space hlling method requires an 
extra optimization step, we choose to directly evaluate x”*" for simplicity’s 
sake. 

Finally we note that it is an very interesting problem to analyze the conver¬ 
gence property of the algorithm. To this end, the convergence analysis in HU 
can provide certain useful results of the MCMC iterations. However, since the 
algorithm is a combination of the two methods, a formal convergence analysis 
can be very challenging, and so is not pursued in this work. 


4 Numerical examples 


We use three numerical examples to demonstrate the performance of the pro¬ 
posed GP accelerated MMG (GP-MMG) method. Before proceeding to the 
examples, we describe the specihc GP surrogate used in all the three exam¬ 
ples. First in all the examples we use an anisotropic covariance function in the 
form of: 


K(x, x') = a exp 


dx 

E 

2 = 1 


Xi 


X':\ 


(4.1) 


where p is a prescribed positive integer which usually takes values of 1 (the 
exponential kernel) or 2 (the squared exponential kernel), the coefficient a is 
determined with empirical Bayes in the iteration, and the correlation length 
1 = is determined from the initial data set and is not adjusted 

in the iteration. Note that, the correlation length 1 can also be determined 
with empirical Bayes in the iteration if desired, but we choose not to do so 
here for simplicity’s sake, as it requires to numerically solving an optimization 
problem. 


4-1 A multi-dimensional analytical example 


Our hrst example is a multi-dimensional problem where the performance func¬ 
tion is 

p(x) = min{pi(x), P 2 (x)} - 1, 
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Fig. 2. The histograms of the first two steps and the 10th iteration of MMC. 
with 

5 fi(x) = ||x - xill , and 5 ( 2 (x) = ||x - X 2 II . 

The input x are multidimensional independently distributed standard nor¬ 
mal random variables and xi,X 2 are two hxed points. It is obvious that each 
Di has two possibly disjoint sections: {x| 5 fi(x) G Bi] and {x| 5 f 2 (x) G i?i}, 
which makes the problem challenging for many variance-reducing sampling 
techniques. 


We hrst test our method for the two dimensional case and choose xi = (3, 3) 
and X 2 = (3, —3) respectively. We run standard MC simulations with 10^ sam¬ 
ples, and use its results as the “truth” to validate the estimates of the MMC 
methods. In the hrst numerical experiment, we perform MMC simulations 
without using surrogates, where 10 iterations are used with 10 ® samples in 
each iteration, resulting in a total computational cost of 10 ® full-model sim¬ 
ulations. When constructing the PDF, we use Ry = [—1,54] which is divided 
into 55 bins. In Fig. 4T we show the histograms obtained in the 1st, 2nd and 
the hnal MMC iteration, from which one can see that the histograms tend to 
become hat as the iterations proceed. 


Our second numerical experiment is to run MMC with the assistance of the 
GP surrogates, and, as is in the hrst experiment, we again use 10 iterations 
with 10® samples in each. In the GP-MMC computation, we construct the GP 
surrogates as is described in the beginning of the section, where the kernel 
is given by Eq. (4.1) with p = 1. The initial data set contains 50 samples 
randomly drawn from the distribution of x, and we choose the random model 
rehnement probability 7 * = lO”"^. The key parameter in the algorithm is the 
maximum misassignment probability /Smax, and to examine the robustness of 
our method against the choices of /dmax, we implement our method with various 
values of d m ax and show the results in Table [Tj In particular, for the results 
of each value of d m ax , we show the number of true model evaluations, the 
maximum and the average relative errors (compared to the MG results) of all 
the bins. 


One can see that, the method performs well even for a very large misassign¬ 
ment probability, and the results are rather robust for different values of dmax 
except that the number of true model evaluations grows as dmax becomes 
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/3max 

0.92 

0.76 

0.32 

0.05 

0.003 

plain MMC 

true model evals 

796 

810 

809 

926 

1089 

10® 

maximum RelErr 

0.1775 

0.148 

0.1058 

0.1321 

0.1217 

0.0921 

average RelErr 

0.0419 

0.039 

0.0327 

0.0333 

0.0345 

0.0225 


Table 1 


(Example 1) The performance results of GP-MMC with various values of /Smax- 



Fig. 3. (Example 1) Top: the PDF of y obtained by MC (circles), MMC (dashed 
line) and GP-MMC (solid line) on a logarithmic scale; inset is the same plots on a 
linear scale. Bottom: the relative error in the PDF obtained by MMC (dashed) and 
GP-MMC (solid). 

smaller. 

To further compare the results, we plot the PDF obtained by MC, MMC and 
GP-MMC with /3max = 0.05, in Fig. (Top), and one can see that the results 
of the three methods agree very well with each other. To have a quantitative 
assessment of the performance, we compute the relative error of the MMC and 
the GP-MMC estimates, against the results of plain MC: 

RelEiTMMC = RelErrcPMMC = (4.2) 

pMC VMC 

and show the results in Fig. (Bottom). 

We see that, the relative errors in both MMC and GP-MMC are around 0.1, 
indicating that both MMC and GP-MMC produce reliable estimates of the 
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PDF of y. To further compare the performance, we computed the mean, the 
variance, the 3rd, the 4th and the 5th central moments of y using the samples 
obtained by the three methods shown in Table which shows that the results 
obtained by the three methods agree well with each other. Regarding the 
computational cost, the MMC method uses 10® full model evaluations while 
our GP-MMC method only uses less than a thousand full-model evaluations. 


moment 

mean 

var 

3rd 

4th 

5th 

MC 

14.21 

43.58 

217.42 

7340.55 

108583.52 

MMC 

14.43 

44.04 

241.11 

7505.02 

113171.57 

GP-MMC 

14.28 

44.04 

230.10 

7456.33 

111877.63 


Table 2 

(Example 1) The mean, variance, and 3rd-5th central moments of y, estimated by 
MC, MMC and GP-MMC . 

We also consider the performance of the proposed method with respect to 
different sample sizes and dimensionality. To this end, we hrst perform the 
GP-MMC method as well as standard MMC with different number of samples 
in each iteration, in which /Smax is taken to be 0.075. The results are shown 
in Table and as expected, with more samples in each iteration, the results 
become more accurate at the price of more true model evaluations. Next, 
we consider the example with different number of dimensions. In this case 
we let xi = (1,...,!)'^ and X 2 = (—1,...,—!)'^ for d = 2, 8, 16. We perform 
both MMC and GP-MMC in each case, and in the GP-MMC we take /dmax = 
0.075 and the number of sample size in each iteration to be 5 x 10^. The 
performance comparison is shown in Table We can see from the results 
that as the dimensionality increases, the GP-MMC method requires more true 
model evaluations, but the computational cost saving compared to standard 
MMC is still signihcant even for the case of 16 dimensions. Overall we have 
found that the performance of the GP-MMC method is rather robust with 
respect to the sample size and the dimensionality. 



sample size 

le-|-4 

le+5 

le-h6 

MMC 

maximum RelErr 

0.3097 

0.1858 

0.0466 

average RelErr 

0.0791 

0.0387 

0.0120 


true model evals 

891 

1855 

2033 

GP-MMC 

maximum RelErr 

0.2593 

0.0906 

0.0576 


average RelErr 

0.087 

0.0328 

0.0147 


Table 3 

(Example 1) The performance of MMC and GP-MMC with respect to various sam¬ 
ple sizes. 
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dimension 

2 

8 

16 

MMC 

maximum RelErr 

0.1173 

0.1168 

0.1531 

average RelErr 

0.0225 

0.0370 

0.0566 

GP-MMC 

true model evals 

891 

3226 

16886 

maximum RelErr 

0.1497 

0.1521 

0.1692 


average RelErr 

0.0414 

0.0422 

0.0665 


Table 4 

(Example 1) The performance of MMC and GP-MMC with respect to varions nnm- 
bers of dimensions. 

Jt-.2 Cantilever beam 


We now consider a cantilever beam problem [T81I25] as illustrated in Fig. 4.2 
with width w, height t, length L, and subject to transverse load Y and hori¬ 
zontal load X. This is a popular benchmark problem in the reliability analysis 
literature, where the performance function is 


y = 



Ewt 


which represents the deflection of the beam. In this example, we assume that 
the beam length is fixed L = 100, and the beam width w, the height x, the 
applied loads X and Y, as well as the elastic module E of the material, are 
random parameters. We further assume that these random parameters are 
all independently distributed, with each following a normal distribution. The 
means and the variances of the parameters are summarized in Table 


parameter 

w 

t 

X 

Y 

E 

mean 

4 

4 

500 

1000 

2.9 X 10® 

variance 

0.001 

0.0001 

100 

100 

1.45 X 10® 


Table 5 

(Example 2) The mean and variance of the random parameters in the cantilever 
beam model. 


In this example, we also compute the PDF of y with three methods: plain 
MC, MMC and GP-MMC. In the MC simulations, we use 10® full model 
evaluations. In both MMC and GP-MMC, we use 10 iterations where 10^ 
samples in each iteration. In the GP-MMC computation, the number of initial 
data and the values of 'jt are the same as those used in the first example. 


The GP kernel is also given by Eq. (4.1) with p = 1. Also, we test the GP- 
MMG method with various values of /dmax and show the results in Table 
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Fig. 4. (Example 2) Schematic illustration of a cantilever beam subject to horizontal 
and vertical loads. 

In this example, we use Ry = [0.56, 0.66] which is divided into 40 bins. To 
compare the results, we plot the PDF obtained by MC, MMC and GP-MMC 
with /3max = 0.32 which requires 4775 true model evaluations, as well as the 
relative errors of MMC and GP-MMC, in Figs. We also show the same 
moment plots as is in the first example in Table All the figures indicate 
that our GP-MMC method yields very reliable estimates of the PDF of y, 
while its computational cost is significantly lower than both MC and standard 
MMC. 

/3max 0.92 0.76 0.32 0.05 0.003 

true model evals 894 2523 4775 7456 7589 

maximum RelErr 0.116 0.143 0.102 0.099 0.089 

average RelErr 0.034 0.042 0.038 0.038 0.037 

Table 6 

(Example 2) The performance results of GP-MMC with various values of /3max- 
moment mean var 3rd 4th 5th 

MC 0.6024 8.99e-5 6.28e-8 2.43e-8 4.96e-ll 

MMC 0.6024 8.97e-5 3.04e-8 2.43e-8 3.32e-ll 

GP-MMC 0.6025 9.04e-5 7.55e-8 2.46e-8 5.54e-ll 

Table 7 

(Example 2) The mean, variance, and 3rd~5th central moments of y, estimated by 
MC, MMC and GP-MMC . 

4-3 Random PDE example 


Finally we consider a random partial differential equation (PDE) example: a 
two-dimensional Poisson equation on region P = [0,1] x |0,1|: 

V(o(x)Vti(x)) = /(x), (4.3a) 

u = 0 on 9P, (4.3b) 

where a(x) is a random field and cIP is the boundary of P. We want to compute 
the statistical distribution of the value of u at location x* G P. A physical 


16 


















Fig. 5. (Example 2) Top: the PDF of y obtained by MC (circles), MMC (dashed 
line) and GP-MMC (solid line) on a logarithmic scale; inset is the same plots on a 
linear scale. Bottom: the relative error in the PDF obtained by MMC (dashed) and 
GP-MMC (solid). 



Fig. 6. (Example 3) The eigenvalues of the KL expansion plotted in a descending 
order. 

interpretation of the problem is the following: we consider a steady flow in an 
isotropic aquifer subject to random permeability [1], and we are interested in 
the statistical information of the hydraulic head at a particular location x*. 

We further assume the permeability is a log-normal random held, namely, 
a(x) = aoexp( 2 ;(x)) where z(x) is a Gaussian random held with zero mean 
and covariance kernel, 


S(xi,X2) 


exp(- 


|xi -X2I 

A 


-)• 


(4.4) 
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Fig. 7. Left: a randomly drawn coefficient sample a{x). Right: the solution of 
Eq. (4.3) associated with a{x). 


In this example we take oq = 1 and A = 0.6. In practice, the random field 
z(x) in the PDE is often represented with a truncated Karhunen-Loeve (K-L) 
expansion. Namely, let be the eigenvalue-eigenfunction pairs of 

the covariance kernel S(-, •) such that Xj > Aj+i for all j = I... 00 , and we can 
approximate z{x.) with 


= ^CjV^^j(x), (4.5) 

i=i 

where c = (ci,...,cj) follows a standard isotropic normal distribution. Thus 
the dimensionality of the problem is reduced to J and in this example we 
choose J = 10. We plot the eigenvalues associated with the 10 KL modes in a 
descending order in Fig. which suggests that 10 KL-modes can sufficiently 
represent the Gaussian held z(x) in this problem. Moreover, in the numerical 
simulations, we take /(x) = 1 and x* = (0.5, 0.5). A sample coefficient a(x) 
and the associated solution m(x) is shown in Fig. |4.3 


As the computational cost for solving Fq. (4.3) is rather high, which renders 
standard MC unfeasible, we choose to only perform MMC and GP-MMC 
simulations in this problem. In both cases, we use 10 iterations with 20000 
samples in each iteration. In GP-MMG, we use the covariance function (4.1) 
with p = 2. The number of initial samples is 400, 7 t = 10 and /^max = 
0.05. As a result the total number of true model evaluations is 4885. When 
constructing the PDF, we use Ry = [—2, 0] divided into 20 bins. We plot the 
PDF computed with MMG and GP-MMG as well as the relative error in the 
two results in Fig. One can see from the hgures that the results of GP- 
MMG agree very well with those of plain MMG, while it only uses around one 
fortieth true model evaluations of the plain MMG. 
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Fig. 8. (Example 3) Top: the PDF of y obtained by MMC (dashed line) and 
GP-MMC (solid line) on a logarithmic scale; inset is the same plots on a linear 
scale. Bottom: the relative error in the PDF. 

5 Conclusions 


We consider a special type of UQ problems where the system performance 
is characterized by a scalar parameter. We propose to nse a MMC based 
method to compnte the distribntion of the performance parameter, and we 
also propose to nse a local GP snrrogate to accelerate the MMC simnlations. 
Based on the work m, we design an adaptive algorithm that can effectively 
rehne the GP snrrogate in the MMC iterations. With numerical examples, 
we demonstrate that the proposed GP-MMC method can efficiently and ac¬ 
curately compute the distribution of the performance parameter. We expect 
the proposed method can be useful in various helds of applications, such as 
reliability analysis, risk management, and utility optimizations. 

There are a number of possible improvements and extensions of the proposed 
method that we plan to investigate in the future. First there are some well- 
known open issues with GP: most notably, how to choose the best covariance 
functions, and such a choice may certainly affect the performance of our MMC- 
GP method. To this end, we hope to develop approaches that can effectively 
choose the covariance functions for our MMG method. Second, as has been 
mentioned in Section 3, we are not able to provide a convergence analysis of 
the proposed method in this paper and we hope to address the issue in a future 
work. Third we are also interested in more general uncertainty propagation 
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problems where the output is a multidimensional vector rather than a scalar. In 
this case, the standard MMC scheme does not apply directly, due to the multi¬ 
dimensionality of the output. We plan to tackle such problems with modified 
MMC algorithms. Finally, we note that the Wang-Landau algorithms, which 
can be regarded as a variant of MMC, have been applied to the Bayesian 
inference problems (e.g. [IQ]), and we hope that our GP-MMC method can 
be applied to such problems as well. In this case, we expect that our method 
can further improve the computational efficiency of the Bayesian inferences, 
thanks to the use of surrogates. 
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